Case Study: Modelling the Spatial Variation of the Explanatory Factors of Water Point Status using Geographically Weighted Logistic Regression (GWLR)
1. OVERVIEW
This study applies GWLR to the attributes of water points in Nigeria.
1.1 Objectives
To build an explanatory model that identifies the factors affecting water point status in Osun State, Nigeria.
1.2 Study Area
Osun State, Nigeria
2. R PACKAGES REQUIRED
The following packages are required for this exercise:
2.1 Load R Packages into R Environment
Usage of the code chunk below:
p_load() (pacman) loads the listed packages. If a package is not yet installed, the function first attempts to install it from CRAN or another repository in pacman's repository list.
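The code chunk itself is not preserved in this copy. A minimal sketch of what such a call could look like follows. The package list is an assumption: only the first six names (sf, tidyverse, funModeling, blorr, corrplot, ggpubr) are visible in the truncated warning output, skimr is added because skim() is used later in this exercise, and the remainder of the original list is unknown. The CRAN mirror URL is also an assumption.

```r
# Install pacman itself if it is missing (assumes this CRAN mirror is reachable).
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman", repos = "https://cloud.r-project.org")
}

# Assumed package list: the original list is truncated in the output,
# so this is illustrative rather than the author's exact call.
pkgs <- c("sf", "tidyverse", "funModeling", "blorr",
          "corrplot", "ggpubr", "skimr")

# p_load() attaches each package, installing it first when it is not found.
pacman::p_load(char = pkgs)

# Report which of the requested packages are now attached.
attached <- setNames(pkgs %in% .packages(), pkgs)
print(attached)
```

Any name that shows FALSE in the printed vector failed to install or load, and is usually worth checking for a misspelling before looking for another cause.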
Installing package into 'C:/Users/joeta/AppData/Local/R/win-library/4.2'
(as 'lib' is unspecified)
Warning: package 'h20' is not available for this version of R
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
Warning: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.2:
cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.2/PACKAGES'
Warning in p_install(package, character.only = TRUE, ...):
Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
logical.return = TRUE, : there is no package called 'h20'
Warning in pacman::p_load(sf, tidyverse, funModeling, blorr, corrplot, ggpubr, : Failed to install/load:
h20
3. GEOSPATIAL DATA
3.1 Acquire Data Source
Aspatial Data
Osun_wp_sf.rds contains the water points within Osun State.
It is an sf point data frame.
Geospatial Data
Osun.rds contains the LGA boundaries of Osun State.
It is an sf polygon data frame.
3.2 Import Data
3.2.1 Import Boundary RDS File
bdy_osun <- read_rds("data/geodata/Osun.rds")
3.2.1.1 Review Imported Data
skim(bdy_osun)
Warning: Couldn't find skimmers for class: sfc_MULTIPOLYGON, sfc; No user-
defined `sfl` provided. Falling back to `character`.
Data summary
Name                      bdy_osun
Number of rows            30
Number of columns         5
_______________________
Column type frequency:
  character               5
________________________
Group variables           None

Variable type: character
skim_variable  n_missing  complete_rate   min   max  empty  n_unique  whitespace
ADM2_EN                0              1     3    14      0        30           0
ADM2_PCODE             0              1     8     8      0        30           0
ADM1_EN                0              1     4     4      0         1           0
ADM1_PCODE             0              1     5     5      0         1           0
geometry               0              1  1805  7898      0        30           0
3.2.2 Import Attribute RDS
wp_osun <- read_rds("data/geodata/Osun_wp_sf.rds")
3.2.2.1 Review Imported Data
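Since one layer is described as an sf polygon data frame and the other as an sf point data frame, a quick structural check after import can confirm the geometry types and that both layers share a coordinate reference system. The sketch below is illustrative only: describe_sf() is a hypothetical helper, and the toy objects (with an assumed EPSG:4326 CRS and made-up coordinates) stand in for bdy_osun and wp_osun so the example runs without the data files.

```r
library(sf)

# Hypothetical helper: summarise geometry type(s), EPSG code and feature count.
describe_sf <- function(x) {
  stopifnot(inherits(x, "sf"))
  list(
    geometry_types = unique(as.character(st_geometry_type(x))),
    epsg           = st_crs(x)$epsg,
    n_features     = nrow(x)
  )
}

# Toy stand-ins; coordinates and CRS are assumptions, not the real data.
toy_bdy <- st_sf(
  name = "toy_lga",
  geometry = st_sfc(
    st_polygon(list(rbind(c(4, 7), c(5, 7), c(5, 8), c(4, 8), c(4, 7)))),
    crs = 4326
  )
)
toy_wp <- st_as_sf(
  data.frame(id = 1:2, lon = c(4.5, 4.7), lat = c(7.5, 7.7)),
  coords = c("lon", "lat"),
  crs = 4326
)

print(describe_sf(toy_bdy))
print(describe_sf(toy_wp))

# Both layers should share a CRS before any spatial join or distance calculation.
same_crs <- st_crs(toy_bdy) == st_crs(toy_wp)
print(same_crs)
```

In the exercise itself, the same helper would be called as describe_sf(bdy_osun) and describe_sf(wp_osun).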
skim(wp_osun)
Warning: Couldn't find skimmers for class: sfc_POINT, sfc; No user-defined `sfl`
provided. Falling back to `character`.
Data summary
Name                      wp_osun
Number of rows            4760
Number of columns         75
_______________________
Column type frequency:
  character               47
  logical                 5
  numeric                 23
________________________
Group variables           None

Variable type: character
skim_variable             n_missing  complete_rate  min  max  empty  n_unique  whitespace
source                            0           1.00    5   44      0         2           0
report_date                       0           1.00   22   22      0        42           0
status_id                         0           1.00    2    7      0         3           0
water_source_clean                0           1.00    8   22      0         3           0
water_source_category             0           1.00    4    6      0         2           0
water_tech_clean                 24           0.99    9   23      0         3           0
water_tech_category              24           0.99    9   15      0         2           0
facility_type                     0           1.00    8    8      0         1           0
clean_country_name                0           1.00    7    7      0         1           0
clean_adm1                        0           1.00    3    5      0         5           0
clean_adm2                        0           1.00    3   14      0        35           0
clean_adm3                     4760           0.00   NA   NA      0         0           0
clean_adm4                     4760           0.00   NA   NA      0         0           0
installer                      4760           0.00   NA   NA      0         0           0
management_clean               1573           0.67    5   37      0         7           0
status_clean                      0           1.00    9   32      0         7           0
pay                               0           1.00    2   39      0         7           0
fecal_coliform_presence        4760           0.00   NA   NA      0         0           0
subjective_quality                0           1.00   18   20      0         4           0
activity_id                    4757           0.00   36   36      0         3           0
scheme_id                      4760           0.00   NA   NA      0         0           0
wpdx_id                           0           1.00   12   12      0      4760           0
notes                             0           1.00    2   96      0      3502           0
orig_lnk                       4757           0.00   84   84      0         1           0
photo_lnk                        41           0.99   84   84      0      4719           0
country_id                        0           1.00    2    2      0         1           0
data_lnk                          0           1.00   79   96      0         2           0
water_point_history               0           1.00  142  834      0      4750           0
clean_country_id                  0           1.00    3    3      0         1           0
country_name                      0           1.00    7    7      0         1           0
water_source                      0           1.00    8   30      0         4           0
water_tech                        0           1.00    5   37      0        20           0
adm2                              0           1.00    3   14      0        33           0
adm3                           4760           0.00   NA   NA      0         0           0
management                     1573           0.67    5   47      0         7           0
adm1                              0           1.00    4    5      0         4           0
New Georeferenced Column          0           1.00   16   35      0      4760           0
lat_lon_deg                       0           1.00   13   32      0      4760           0
public_data_source                0           1.00   84  102      0         2           0
converted                         0           1.00   53   53      0         1           0
created_timestamp                 0           1.00   22   22      0         2           0
updated_timestamp                 0           1.00   22   22      0         2           0
Geometry                          0           1.00   33   37      0      4760           0
ADM2_EN                           0           1.00    3   14      0        30           0
ADM2_PCODE                        0           1.00    8    8      0        30           0
ADM1_EN                           0           1.00    4    4      0         1           0
ADM1_PCODE                        0           1.00    5    5      0         1           0

Variable type: logical
skim_variable  n_missing  complete_rate  mean  count
rehab_year          4760              0   NaN  :
rehabilitator       4760              0   NaN  :
is_urban               0              1  0.39  FAL: 2884, TRU: 1876
latest_record          0              1  1.00  TRU: 4760
status                 0              1  0.56  TRU: 2642, FAL: 2118

Variable type: numeric
skim_variable               n_missing  complete_rate       mean         sd         p0        p25        p50        p75       p100  hist
row_id                              0           1.00   68550.48   10216.94   49601.00   66874.75   68244.50   69562.25  471319.00  ▇▁▁▁▁
lat_deg                             0           1.00       7.68       0.22       7.06       7.51       7.71       7.88       8.06  ▁▂▇▇▇
lon_deg                             0           1.00       4.54       0.21       4.08       4.36       4.56       4.71       5.06  ▃▆▇▇▂
install_year                     1144           0.76    2008.63       6.04    1917.00    2006.00    2010.00    2013.00    2015.00  ▁▁▁▁▇
fecal_coliform_value             4760           0.00        NaN         NA         NA         NA         NA         NA         NA
distance_to_primary_road            0           1.00    5021.53    5648.34       0.01     719.36    2972.78    7314.73   26909.86  ▇▂▁▁▁
distance_to_secondary_road          0           1.00    3750.47    3938.63       0.15     460.90    2554.25    5791.94   19559.48  ▇▃▁▁▁
distance_to_tertiary_road           0           1.00    1259.28    1680.04       0.02     121.25     521.77    1834.42   10966.27  ▇▂▁▁▁
distance_to_city                    0           1.00   16663.99   10960.82      53.05    7930.75   15030.41   24255.75   47934.34  ▇▇▆▃▁
distance_to_town                    0           1.00   16726.59   12452.65      30.00    6876.92   12204.53   27739.46   44020.64  ▇▅▃▃▂
rehab_priority                   2654           0.44     489.33    1658.81       0.00       7.00      91.50     376.25   29697.00  ▇▁▁▁▁
water_point_population              4           1.00     513.58    1458.92       0.00      14.00     119.00     433.25   29697.00  ▇▁▁▁▁
local_population_1km                4           1.00    2727.16    4189.46       0.00     176.00    1032.00    3717.00   36118.00  ▇▁▁▁▁
crucialness_score                 798           0.83       0.26       0.28       0.00       0.07       0.15       0.35       1.00  ▇▃▁▁▁
pressure_score                    798           0.83       1.46       4.16       0.00       0.12       0.41       1.24      93.69  ▇▁▁▁▁
usage_capacity                      0           1.00     560.74     338.46     300.00     300.00     300.00    1000.00    1000.00  ▇▁▁▁▅
days_since_report                   0           1.00    2692.69      41.92    1483.00    2688.00    2693.00    2700.00    4645.00  ▁▇▁▁▁
staleness_score                     0           1.00      42.80       0.58      23.13      42.70      42.79      42.86      62.66  ▁▁▇▁▁
location_id                         0           1.00  235865.49    6657.60   23741.00  230638.75  236199.50  240061.25  267454.00  ▁▁▁▁▇
cluster_size                        0           1.00       1.05       0.25       1.00       1.00       1.00       1.00       4.00  ▇▁▁▁▁
lat_deg_original                 4760           0.00        NaN         NA         NA         NA         NA         NA         NA
lon_deg_original                 4760           0.00        NaN         NA         NA         NA         NA         NA         NA
count                               0           1.00       1.00       0.00       1.00       1.00       1.00       1.00       1.00  ▁▁▇▁▁
3.3 Exploratory Data Analysis (EDA)
3.3.1 Plot Bar Chart
3.3.1.1 Visualise “status”
wp_osun %>% freq(input = "status")
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
ℹ The deprecated feature was likely used in the funModeling package.
Please report the issue at <https://github.com/pablo14/funModeling/issues>.
  status  frequency  percentage  cumulative_perc
1   TRUE       2642        55.5             55.5
2  FALSE       2118        44.5            100.0
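The same frequency table can be reproduced without funModeling, which avoids the deprecation warning above. The sketch below is an illustration rather than the method used in this exercise: it swaps in dplyr, and it rebuilds a stand-in status column from the reported counts (2642 TRUE, 2118 FALSE) so that it runs without the data file. In the exercise, wp_osun would take the place of status_df, with sf::st_drop_geometry() applied first.

```r
library(dplyr)

# Stand-in data rebuilt from the counts reported above (not the real file).
status_df <- data.frame(status = c(rep(TRUE, 2642), rep(FALSE, 2118)))

status_freq <- status_df %>%
  # Count rows per status value, most frequent first.
  count(status, name = "frequency", sort = TRUE) %>%
  mutate(
    # Share of all water points, rounded to one decimal place.
    percentage      = round(100 * frequency / sum(frequency), 1),
    # Running total of the rounded percentages.
    cumulative_perc = cumsum(percentage)
  )

print(status_freq)
```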
3.3.1.2 Visualise “status” by “water_tech_category”
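The code for this step is not preserved in this copy. One possible approach, offered only as a sketch, is a grouped bar chart built with ggplot2. plot_status_by() is a hypothetical helper, and the toy data frame uses placeholder category labels ("Type A", "Type B") and made-up rows, not values from wp_osun.

```r
library(ggplot2)

# Hypothetical helper: bar chart of status counts within each group.
plot_status_by <- function(data, group_col) {
  ggplot(data, aes(x = .data[[group_col]], fill = status)) +
    geom_bar(position = "dodge") +
    labs(x = group_col, y = "Number of water points", fill = "status")
}

# Placeholder rows for illustration only; the real categories are not shown here.
toy <- data.frame(
  status = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  water_tech_category = c("Type A", "Type A", "Type A", "Type B", "Type B", NA)
)

p <- plot_status_by(toy, "water_tech_category")
print(p)
```

With the real data, the call would be plot_status_by(wp_osun, "water_tech_category"). The 24 records with a missing water_tech_category would appear as a separate NA bar unless they are filtered out first.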
Chua A. (2022). In-class Ex5: Modelling the Spatial Variation of the Explanatory Factors of Water Point Status using Geographically Weighted Logistic Regression. https://isss624-amelia.netlify.app/exercises/in-class_ex5/in-class_ex5